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Abstract 

Grouped data are commonly encountered in applications. All data from a contin¬ 
uous population are grouped due to rounding of the individual observations. The 
Bernstein polynomial model is proposed as an approximate model in this paper for 
estimating a univariate density function based on grouped data. The coefficients of 
the Bernstein polynomial, as the mixture proportions of beta distributions, can be 
estimated using an EM algorithm. The optimal degree of the Bernstein polynomial 
can be determined using a change-point estimation method. The rate of conver¬ 
gence of the proposed density estimate to the true density is proved to be almost 
parametric by an acceptance-rejection arguments used in Monte Carlo method. The 
proposed method is compared with some existing methods in a simulation study 
and is applied to the Chicken Embryo Data. 

KEYWORDS:Acceptance-rejection method. Approximate model, Bernstein Type 
polynomials; Beta Mixture, Change-point, Density estimation; Grouped data; Model 
selection; Nonparametric model; Parametrization; Smoothing. 


1 Introduction 


In real world applications of statistics, many data are provided in the form of fre¬ 
quencies of observations in some fixed mutually exclusive intervals, which are 
called grouped data. Strictly speaking, all the data from a population with a con¬ 
tinuous distribution are grouped due to rounding of the individual observations 
(Hall 19821. The EM algorithm has been used to deal with grouped data (|Demp¬ 


ster et al. 19771. McEachlan & Jones (1988) introduced the EM algorithm for 
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fitting mixture model to grouped data (see Jones & McLachlan 1990 also). Under 
a parametric model, let f{x; 9) be the probability density function (PDF) of the un¬ 
derlying distribution with an unknown parameter 9. The maximum likelihood esti¬ 
mate (MLE) of the parameter 9 can be obtained from grouped data and is shown to 
be consistent and asymptotically normal (see, for example, |Lindley[ 1950 Tallis 


19671. Parametric MLE is sensitive to model misspecification and outliers. The 


minimum Hellinger distance estimate (MHDE) of the parameter using grouped 
continuous data is both robust for contaminated data and asymptotically efficient 
(Beran 1977a|b ). Parametric methods for grouped data requires evaluating in¬ 
tegrals which makes the computation expensive. To lower the computation cost 
Lin & H^(2006 1 proposed the approximate minimum Hellinger distance estimate 


(AMHDE) for grouped data by the data truncation and replacing the probabilities 
of class intervals with the first order Taylor expansion. Clearly their idea works for 
MLE based on grouped data. 

Under nonparametric setting, the underlying PDL / is unspecified. Based on 
grouped data / can be estimated by the empirical density, the relative frequency 
distribution, which is actually a discrete probability mass function. The kernel 
density estimation ( Rosenbl^ [1956 1971| ) can be applied to grouped data (see 
Linton & Whang[ 2002| Jang & Loh 2010 Minoiu & Reddy[ 2014[ for exam¬ 


ple). The effects of rounding, truncating, and grouping of the data on the kernel 
density estimate have been studied, maybe among others, by [Hall (1982), Scott 


& Sheather (1985), and Titteringt^ ( 1983]). However, the expectation of kernel 


density estimate is the convolution of / and the kernel scaled by the bandwidth. It 
is crucial and difficult to select an appropriate bandwidth to balance between the 
bias and variance. Many authors have proposed different methods for data-based 


bandwidth selection over the years. The readers are referred to a survey by Jones 


et al.|((T996|) for details and more references therein. Another drawback of the ker¬ 


nel density is its boundary effect. Methods of boundary-effect correction have been 
studied, among others, by Rice (1984) and Jones ( |1993 ). 

“All models are wrong”(Box 1976). So all parametric models are subject to 
model misspecification. The normal model is approximate because of the central 
limit theorem. The goodness-of-fit tests and other methods for selecting a paramet¬ 
ric model introduce additional errors to the statistical inference. 

Any continuous function can be approximated by polynomials. Vitale (1975) 
proposed to estimate the PDL / by estimating the coefficients f{i/m) of the Bern¬ 
stein polynomial (Bernstein 1912) B/(t) = 

f{i/m) = {m + l){Fn[{i + l)/{m + l)]-Fn[i/{m + l)]}, i = 0,1,... ,m, where 
Fn is the empirical distribution function of xi,..., Since then, many authors 
have applied the Bernstein polynomial in statistics in similar ways (seelGuanl 12014 
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for more references). These and the kernel methods are not model-hased and not 
maximum likelihood method. Thus they are not efficient. The estimated Bernstein 
polynomial IB/(f) = “ f)™"* aims at B/(f). It is known 

that the best convergence rate of to f{t) is at most 0(m“^) if / has contin¬ 
uous second or even higher order derivatives on [0,1]. Buckland ( 1992| l proposed 
a density estimation with polynomials using grouped and ungrouped data with the 
help of some specified parametric models. 

Thanks to a result of Lorentz (19631 there exists a Bernstein (type) polynomials 

fm{t]p) = Y."LoPniiPmi{t), whcrC ^ 0, = (m + - t)^~\ 

i = 0, ... ,m, whose rate of convergence to f{t) is at least if / has a 

continuous r-th derivative on [0,1] and r ^ 2. This is called a polynomial with 
“positive coefficients” in the literature of polynomial approximation. Guan (2014i 
introduced the Bernstein polynomial model fm{t',p) as a globally valid approxi¬ 
mate parametric model of any underlying continuous density function with support 
[0,1] and proposed a change-point method for selecting an optimal degree m. It has 
been shown that the rate of convergence to zero for the mean integrated squared 
error(MISE) of the maximum likelihood estimate of the density could be nearly 
parametric, 0(n“^+'^), for all e > 0. This method does not suffer from the bound¬ 
ary effect. 

If the support of / is different from [0,1] or even infinite, then we can choos e 


appropriate (truncation) interval [a, b] so that f{x)dx 1 (see 


Guan 


20141. 


an 

Therefore, we can treat [a, b] as the support of / and we can use the linearly trans¬ 
formed data Hi = {xi — a)/{b — a) in [0, 1] to obtain estimate g of the PDF g of 
Pi’s, respectively. Then we estimate / by f{x) = g{{x — a)/(b — a)}/(b — a). In 
this paper, we will assume that the density / has support [0,1]. 

This Bernstein polynomial model fm{t',p) is a finite mixture of the beta den¬ 
sities /3mi(f) of beta(f -|- 1, m — z -|- 1), f = 0,..., m, with mixture proportions 
P = {PmO, ■ ■ ■ ,Pmm)- It has been shown that the Bernstein polynomial model 
can be used to fit a ungrouped dataset and has the advantages of smoothness, ro¬ 
bustness, and efficiency over the traditional methods such as the empirical distri¬ 
bution and the kernel density estimate (Guan 20141. Because these beta densities 
and their integrals are specified and free of unknown parameters, this structure of 
fm{t] p) is convenient. It allows the grouped data to be approximately modeled by 
a mixture of m -|- 1 specific discrete distributions. So the infinite dimensional “pa¬ 
rameter” / is approximately described by a finite dimensional parameter p. This 
and the nonparametric likelihood are similar in the sense that the underlying distri¬ 
bution function is approximated by a step function with jumps as parameters at the 
observations. 

Due to the closeness of fm{t',p) to f{t), by the acceptance-rejection argu- 
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ment for generating pseudorandom numbers, almost all the observations in a sam¬ 
ple from f{t) can be used as if they were from fm{t',p)- It will be shown in 
this paper that the maximizer of the likelihood based on the approximate model 
fm{t',p) targets po which makes fm{t]Po) the unique best approximation of /. 
This acceptance-rejection argument can be used to prove other asymptotic results 
under an approximate model assumption. 

In this paper we shall study the asymptotic properties of the Bernstein polyno¬ 
mial density estimate based on grouped data and ungrouped raw data as a special 
case of grouping. A stronger result than that of |Guan ( 2014| l about the rate of con¬ 
vergence of the proposed density estimate based on ungrouped raw data will be 
proved using a different argument. We shall also compare the proposed estimate 
with those existing methods such as the kernel density, parametric MLE, and the 
MHDE via simulation study. 

The paper is organized as follows. The Bernstein polynomial model for grouped 
data is introduced and is proved to be nested in Section The EM algorithm for 
finding the approximate maximum likelihood estimates of the mixture proportions 
is derived in this section. Some asymptotic results about the convergence rate of 
the proposed density estimate are given in Section The methods for determining 
a lower bound for the model degree m based on estimated mean and variance and 
for choosing the optimal degree m are described in Section In Section the 
proposed methods are compared with some existing competitors through Monte 
Carlo experiments, and illustrated by the Chicken Embryo Data. The proofs of the 
theorems are relegated to the Appendix. 


2 Likelihood for grouped data and EM algorithm 


2.1 The Bernstein polynomial model 

Eet [0,1] be the class of functions which have r-th continuous derivative 
on [0,1]. Eike the normal model being backed up by the central limit theorem, 
the Bernstein polynomial model is supported by the following mathematical result 
which is a consequence of Theorem 1 of Eorentz| ( 19631. We denote the m-simplex 
by 

m 

— ^P — (PmO) • • • ■ Pmj ^ 0; ^ ^ Pmj — 1^- 

i=o 


Theorem 1 . If f £ f{t)dt = 1, and f{t) ^ 6 > 0, then there 

exists a sequence of Bernstein type polynomials fmit',p) = Y^'iLoPmifimiit) with 
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p G Sm. such that 


\fit) - ^ 


( 1 ) 


where Am{t) = max{m — t)/m} and the constant C{r,6,f) depends 

on r, 5, maxt \ f{t)\, and maxf \ i = 2,... ,r, only. 


The uniqueness of the best approximation was proved by |Passow (19771. Let 


(2014 

mode 


/ be the density of the underlying distribution with support [0,1]. We approximate 
/ using the Bernstein polynomial fm{t]p) = Yl]LoPmj/3mjit), whe re p £ S^. 

Define = {/m(f;p) = T,T=oPmj^mj{t) : p G Sm}- |Guan 

showed that, for all r ^ 1, Dm C Dm+r- So the Bernstein polynomia _ 

fm{t]p) of degree m is nested in all Bernstein polynomial models of larger de¬ 
grees. 

Let [0,1] be partitioned by N class intervals {(L_i, fj] : f = 1,..., A'^}, where 
0 = fo < ft < • • • < = 1- The probability that a random observation falls in 

the f-th interval is approximately 


rti rn 

— / fmilj P)dt — ^ ^ CLjjPmj ; 
7q-i 


( 2 ) 


where aij = Dmj{U) — Dmj{ti-i), i = l,...,iV, Dmj{t) is the cumulative 
distribution function (CDF) of beta(y + 1, m — j' + 1), y = 0,1,..., m, and 


N 


^ ^ dmijp) — ^ ^, Pmj — 1 - 
j=0 


2=1 


So the probability 9mi{p) is a mixture of a specific components {ojo, ■ ■ • ,Oim} 
with unknown proportions p = {pmo, ■ ■ ■ ,Pmm)- 

By Theorem 2-1 of Guan| ( [20T4) ), the above Bernstein polynomial model (|^ of 
degree m for grouped data is nested in a model of degree m + r, i.e., for all r ^ 1, 


m 

( I'r 


rti 


9m.i — 


^ ^^ Pmj/3mj{t)dt — / ^ ^ Pm+r,jPm+r,jit')dt — 6m+r,ii i — Ij ■ ■ • ) Af. 


'h-i y=o 


'U-i 


2.2 The Bernstein likelihood for grouped data 

In many applications, we only have the grouped data {rij, fj] : f = 1,..., N} 
available, where 0 = fo < ft < • • • < = 1 and n* = #{y G (1,..., n) : xj G 
{ti-i,ti]}, i = 1,..., A^, and xi,..., is a random sample from a population 
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having continuous density f{x) on [0,1]. Our goal is to estimate the unknown 
PDF /. The loglikelihood of (ni,..., n]\f) is approximately 


N m 

i=l j={) 


(3) 


where the mixture proportions p = (pmOj • • • ,Pmm) are subject to the feasibility 
constraints p G Sm- For the ungrouped raw data xi,..., Xn, the loglikelihood is 

n m 

(-Rip) = { '^Prajlimj{Xi)Y (4) 

i=l j=0 

If we take the rounding error into account when the observations are rounded to 
the nearest value using the round half up tie-breaking rule, then 


^g(p) 


oo m 

^ ^ Tlj log ^ ^ Pmj(fji) l)} j 

i=—oo j=0 


(5) 


where ti = (i + ll2)lK, z = 0, ±1, ±2,..and iF is a positive integer such that 
any observation is rounded toi/K for some integer i. 

We shall call the maximizers pc and pR of icip) and inip) the maximum 
Bernstein likelihood estimates (MBLE’s) of p based on grouped and raw data, re¬ 
spectively, and call /s(f) = fm{t-,PG) and /^(f) = fm{t;PR) the MBLE’s of / 
based on grouped and raw data, respectively. 

It should also be noted that as —)■ oo and max{Afj = f, — tj_i : i = 

1,..., A} —)■ 0 the above loglikelihood Q reduces to the loglikelihood Q for un¬ 
grouped raw data. Specifically, lirn ma ^ {^g{p) - Eill log = ^r{p)- 
If the underlying PDE / is approximately fm{t;p) = YlT=oPmil3mi{t) for 
some m ^ 0, then the distribution of the grouped data (ni,..., tin) is approxi¬ 
mately multinomial with probability mass function 


P{Wi = m,..., Wn = Rn) 



The MLE’s of 9mi^ are Omi = = 1, • • •, So the MLE’s prnj of Pmj satisfy 

the equations YJj=o aijPmj = * = 1,..., A^, and {pmo, ■ ■ ■ ,Pmm) e Sm- 

Becausepmo = 1 - YJf=iPmj, Pmj satisfy equatins 

m 

^ (^io)Pmj — Q.j0) i 

i=l ^ 


6 






and inequality constraints pmj ^ 0, j = 1,... ,m, and ^ 1. It seems 

not easy to algebraically solve the above system of equations with inequality con¬ 
straints. In the next section, we shall use an EM-algorithm to find the MLE of 
P- 


2.3 The EM Algorithm 

Eet 5ij = 1 or 0 according to whether or not Xi was from beta(j = 

1,... ,n,j = 0,..., m. We denote by Zj = {zn ,..., the vector of indicators 
Zij = I{xi G {tj-i,tj]}, i = 1,... ,n, j = I,..., N. Then the expected value of 
5ij given Zj is 




Praj 

Yl'h=0Pnih 


Note that J2]Lo the observations are ni = YYi=i '^j^o — 

YYi=i ^ = 1, • • •, A. The likelihood of 5ij and z* is 


n m N 

■^c{p) = n n 


i=lj=0 


1=1 


The loglikelihood is then 


n m 


Y{p) — Sij 

i=l j=o 


N 

logpmj + ^ Zil log{‘Bmj{ti) - 
l=l 


E-Step Given we have 


Q{p,p^"^) 


^pm{Yp)\z} 

n m 

i=l j=0 


N 

logpmj + ^ Zil log{T>mj{tl) - 
1=1 


M-Step Maximizing Q{p, with respect to p subject to constraint p £ we 
have, for s = 0,1,..., 


P'. 


(^+1) _ 


mj 


-„trAP< 


5) 


2=1 


1 nip^^^j{‘Bmj(tl) - 'Bmj(tl-l)} 

” h ZLoPi^ii^mhitl) - ^mh(tl-l)}' 


1 



Starting with initial values j = 0,..., m, we can use this iterative formula to 
obtain the maximum Bernstein likelihood estimate pc- If the ungrouped raw data 
xi,... ,Xn are available, then the iteration (Guan[ 20141 is reduced to 


p, 


(^+1) _ 


mj 


1 


n 


E 


PmjPmj{Xi) 


=1 Ylh=oPrnhl^rnhixi 


j = 0, 


, m; 


s = 0,1, 


(V) 


The following theorem shows the convergence of the EM algorithm and is proved 
in the Appendix. 

Theorem 2. (i) Assume p'^- > 0, j = 0, 1, ..., m, an'f ~ Then 

as s ^ oo, converges to the unique maximizer p^ of Ir{p). (ii) Assume 
Prnj > 0. j = 0, 1, ..., m, and Pm] = 1- Then as s ^ oo, converges to 
the unique maximizer pc oficip). 


3 Rate of Convergence of the Density Estimate 


In this section we shall state results about the convergence rate of the density esti¬ 
mates which will be proved in the Appendix. Unlike most asymptotic results about 
maximum likelihood method which assume exact parametric models, we will show 
our results under the approximate model p) = YljLo Pmjfimj (f)- For a given 
Po, we define the norm 


bill = 


{Ut-p)Y 

Ut-po) 


The squared distance between p and po with respect to norm || • b is 


\\P-Po\\b = 


{/m(f;p) - /m(f;Po)}^ , 
fm{t-,Po) 


With the aid of the acceptance-rejection argument for generating pseudoran¬ 
dom numbers in the Monte Carlo method we have the following lemma which 
may be of independent interest. 

Lemma 3. Let f G l]for some positive integer k, f{t) ^ 6 > 0, and 

fm{t) = /m(f;Po) be the unique best approximation of degree mfor f. Then a 
sample xi,..., Xnfrom f can be arranged so that the first observations can be 
treated as if they were from fm. Moreover, for all p such that fm{xj]p) ^ (5' > 0, 
j = 1,... ,n, 

n 

(■r{p) = fm{Xi]p) = b(p) +-Rmn, 

i=\ 

8 


( 8 ) 







where £r{p) = Yli=i log fmixi] p), and 

am = n — 0{nm~^) — 0 ^\/ nm~^ log log , a.s., 

\Rmn\ = 0{nm~^) + 0 ^-\/ nm~^ log log 


a.s.. 


(9) 

( 10 ) 


Remark 3.1. So Ir{p) is an “exact” likelihood of xi,... ,Xi,^ while ^r{p) = 
Yll=i log fm{xi] p) is an approximate likelihood of the complete data xi,..., Xn 
which can be viewed as a slightly contaminated sample from fm- Maximizer p of 
iR{p) approximately maximizes (■r{p). Hence fmif p) targets at fm{t', Po) which 
is a best approximate of f. 


For density estimation based on the raw data we have the following result. 

Theorem 4. Suppose that the PDF f G 1] for some positive integer k, 

f{t) ^ 6 > 0, and m = 0(n^(^). As n ^ oo, with probability one the maximum 
value of I!.r{p) is attained by some Pr in the interior ofMmirn) = {p ^ Sm : 
Up “ Polls ^ J’n}’ where = \ogn/^/n andpo makes /m(-;Po) the unique best 
approximation of degree m. 

Theorem 5. Suppose that the PDF f G 1] for some positive integer k, 

f{f) ^ 6 > 0, and 0 < c^vf!^ ^ m ^ cYofl^ < oo. Then there is a positive 
constant C such that 


^ f {fm{t;PR) - 

— m — 




(log n) 


n 


( 11 ) 


Because f is bounded there is a positive constant C such that 
MISE(/b) = E [ {fmifPR) - fm^dt ^ C 


(logn)^ 


n 


( 12 ) 


Note that ( [TT] ) is a stronger result than ( [T^ which is an almost parametric rate 
of convergence for MISE. Guan ( 2014[ l showed a similar result under another set 
of conditions. The best parametric rate is 0(n“^) that can be attained by the para¬ 
metric density estimate under some regularity conditions. 

For 9mi{p) = Y.T=oPmj{'R7nj{ti) “ 'BmjiU-i)}, wc define norm 


||„||2 _ OUp ) 

WPWg — 


i=l ^™*(Po) 


The squared distance between p and po with respect to norm || • ||g is 


IIp-PoIIg = 


N 

E 

2 = 1 


{^mi(p) ^mi(Po)} 

9mi(po) 
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By the mean value theorem, we have 




'U-1 


where Ati = U — U-i and G tj]. Thus ||p — Po\\g is a Riemann sum 


N 1 

IIP-PoIIg = ~ / Mt)dt=\\p-po\\B, (13) 

*=1 do 


where 


V’m(i) = 




For grouped data we have the following. 

Theorem 6. Suppose that the PDF f G 1] for some positive integer k, 

f{t)^6>0, and m = As n ^ oo, with probability one the maximum 

value of iG{0m) is attained at pc in the interior ofMm{rn) = {p ^ Sm : \\p — 
PoWg ^ ^n}> where Vn = \ogn/^/n and po makes /m(sPo) the unique best 
approximation. 

For the relationship between the norms \\p — Poll^ \\P ~ PoIIg’ 
the following result. 

Theorem 7. Suppose that the PDF f G 1] for some positive integer k, 

and f{t) ^ S > 0. Let po G Sm be the one that makes fm{-]Po) the unique best 
approximation of f. Then for all p G Sm, we have 

l|p-Po||| = IIp-PoIIg + 


For a grouped data based estimate pG, the rate of eonvergenee of \\pG — Poll g 
to zero is O((logn)^/n). However the rate of eonvergenee of ||pg — Polls 
zero depends on that of maxj Afj. For equal-width elasses, Af = l/N, and 
N = we have ||pG ~ Polls = O((logn)^/n) + Thus 

IIpg — Polls = O((logn)^/n) if m = 0(n^/^). If k is large, then N ^ ^/n. 

Theorem 8. Suppose that the PDF f G 1] for some positive integer k, 

f{t)^5>0, and 0 < con^/^ ^ m ^ c\vi}l^ < oo. Then we have 


f {fmifPc) - f{t)y 

J m 


O((logn)^/n) + 0(m^ maxAff). (14) 

i 


.,m, 
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Also, because f is bounded, 


MISE(/,j) 


E j{fm{t;PG) - 
O((logn)^/n) + 0{m‘^ maxAtj). 


(15) 


4 Model Degree Selection 


Guan ( 2014| l showed that the model degree m is bounded below approximately by 
rrif, = max {l, \p{l — ~ 3] }■ Based on the grouped data, the lower bound 

nib can be estimated by rfib = maxjl, — /i)/cr^ — 3] }, where 


N N 1 / ^ 

5-2 =- T^riiit* -fif = -- I -nfl 

n n — 1 n — 1 \ 

2 = 1 2=1 \ 2=1 


d = 


t* = {ti-i+ti)/2, i = 

Due to overfitting the model degree m cannot be arbitrarily large. With the 
estimated rhb, we choose a proper set of nonnegative consecutive integers, M = 
{mo, mo + 1,..., mo + k] such that mo < mb- Then we can estimate an opti¬ 


mal degree m using the method of change-point estimation as proposed by Guan 


(20141. For each m, = mo -|- i we use the EM algorithm to find the MBLE 
and calculate = £{pmi)- Let yi = £i — £i-i, i = 1, ... ,k. The yi’s are non¬ 


negative because the Bernstein polynomial models are nested. Guan (20141 sug¬ 
gested that yi,... ,yT be treated as exponentials with mean yi and yr+i, ■ ■ ■ ,yk 
be treated as exponentials with mean yQ, where yi > yo, so that r is a change 
point and m,- is the optimal degree and use the change-point detection method (see 
Section 1.4 of Csdrgo & Horvath] 19971 for exponential model to find a change- 

Specifically, 


point estimate r. Then we estimate the optimal m by m = 
f = argmaxi,gT-^fc{i((r)}, where the likelihood ratio of r is 


nif 


R{t) = k log 


^°^-rlog f )-(/c-r)log 


k — T 


T = l,...,k. 


If R{t) has multiple maximizers, we choose the smallest one as r. 


5 Simulation Study and Example 

5.1 Simulation 

The distributions used for generating pseudorandom numbers and the parametric 
models used for density estimation are as following. 
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(i) Uniform(0,l): the uniform distribution with /i = 1/2 and = 1/12 as a 
special beta distribution beta(l,l). The parametric model is the beta distri¬ 
bution beta(a, 1). 

(ii) Exp(l): the exponential distribution with mean /r = 1 and variance cr^ = 1. 
We ttuncate this distribution by the interval [a, 6] = [0,4]. The parametric 
model is the exponential distribution with mean = 9. 

(iii) Pareto(4, 0.5): The Pareto distribution with shape parameter a = 4 and 
scale parameter xq = 0.5 which is treated as known parameter. The mean 
and variance are, respectively, ^ = axQ/{a — l) = 2/3 and cr^ = XQa/[{a — 
l)^(a — 2)1 = 1/18. We truncate this distribution by the interval [a, b] = 
[xo, Ai + 4it] rs [0.5, 1.6095]. The parametric model is Pareto(Q;, 0.5). 

(iv) NN(A;): the nearly normal distribution of Uk = (rti + • • • + Uk)/k with 
ui,... ,Uk being independent uniform(0,l) random variables. The lower 
bound is mi, = 3{k — 1). We used the normal distribution N(/r, cr^) as 
the parametric model. 

(v) N(0,1): the standard normal distribution truncated by the interval [a, b] = 
[—4,4]. The parametric model is cj^). 

(vi) Logistic(0, 0.5): the logistic distribution with location ^ = 0 and scale s = 
0.5 so that (T^ = (s7r)^/3 = 7r^/12. We truncate this distribution by the 
interval [a,b] = [n — 4a, ^ + 4cr] [-2.9619, 2.9619]. The parametric 
model is LogisticC/r, s). 


Except the normal distribution, the above parametric models were chosen for 
the simulation because the CDE’s have close-form expressions so that the expen¬ 
sive numerical integrations can be avoided for the MHDE and the MEE. 

Erom each distribution we generated 500 samples of size n = 50,100, 200, and 
500 and the grouped data using N = 5, 10,10 and 20 equal-width class intervals, 
respectively. The model degree m were selected using the change-point method 
from {1, 2,..., 40}. 


Erom the results of Guan ( |2014 1 we see that the Bernstein polynomial method 
is much better than the kernel density for ungrouped data. The AMHDE is approx¬ 
imation for MHDE. So we only compare kernel, the MEE, the MHDE, and the 
proposed MB EE. Eor the kernel density estimate /ic(x) = ^ iT' 


we used normal kernel K(x) = e and the commonly recommended 

method of |Sheather & Jones (19911 to choose the bandwidth h. Because E[/x(x)] = 

^ K ( ) fiy)dy = {Kfi * f) (x). This is the convolution of / and the scaled 
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kernel Kh{-) = K{-/h)/h. So no matter how the bandwidth h is chosen, there is 
always trade-off between the bias and the variance. 

Table [TJpresents the simulation results of the density estimations. As expected, 
the proposed Bernstein polynomial method performs much better than the kernel 
density method and is similar to the other two parametric methods. Table [T] also 
shows the estimated mean and variance of the optimal model degree selected by 
the change-point method. It seems that the performance of the estimated optimal 
model degree m is satisfactory. 

It should be noted that the density of NN(A:) satisfies V’fc(f) £ [0,1] 

but ^ 1] for A: ^ 2. In fact, when. A: ^ 2, V'fc(t) is a piece- 

wise polynomial function of degree {k — 1) defined on pieces [i/k, {i -|- 1)/A:), 
i = 0,1,..., A; — 1. Except NN(k) all the other population densities have con¬ 
tinuous derivatives of all orders on their supports. In the simulation, we used the 
normal distributions as the parametric models of NN(4). Here both the normal and 
the Bernstein polynomial are approximate models . In fact, in most applications 
the normal distribution is an approximate model due to the central limit theorem. 
We did a simulation on the goodness-of-fit of the normal distribution to the sample 
from NN(4). In this simulation, we generate 5,000 samples of size n from NN(4). 
We ran the Kolmogorov-Smimov test for each sample. For n = 50,100, 200, and 
500 the average of the p-values are, respectively, 0.7884, 0.7875, and 0.7470; and 
the numbers of p-values among the 5000 that are smaller than 0.05 are, respec¬ 
tively, 3, 2, 0, and 2. So the normal distribution will accepted as the parametric 
model for NN(4) almost all the time. The performance of the proposed MBLE for 
samples from NN(4) is even better than that of the MEE when sample size n is 
small. 


5.2 The Chicken Emhryo Data 


The chicken embryo data contain the number of hatched eggs on each day during 
the 21 days of incubation period. The times of hatching (n = 43) are treated as 
grouped by intervals with equal width of one day. The data were studied first by 
Jassim et al. (19961. Kuurman et al. (20031 and |Ein & He (20061 also analyzed 
the data using the MHDE, in addition to other methods assuming some parametric 
mixture models including Weibull model. The latter used the AMHDE to fit the 
data by Weibull mixture model. The estimated density using the proposed method 
is close to the parametric MEE. 

Applying the proposed method of this paper, we truncated the distribution us¬ 
ing [a, b] = [0, 21] and selected the optimal model degree m = 13 from {2,3,..., 50} 
using the change-point method. Figure[^displays the loglikelihood i{m), the like- 
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Table 1: Estimated mean and varianee of rh, and mean integrated squared errors 
(MISE’s) of the kernel density fx, the MEE /ml> the MHDE fuuD, and the pro¬ 
posed maximum Bernstein likelihood estimate (MBEE) /b based 500 simulated 
samples of size n whieh are grouped by N equal-width elass intervals, respee- 
tively. 

MISE 



E(m) 

var(m) 

/b 

/k 

/ml 

/mhd 

Beta(l,l) 

14.91 

3.95 

n = 50, 
0.3898 

N = 5 

2.5722 

0.0193 

0.0222 

Exp(l) 

14.56 

9.14 

0.0447 

0.7502 

0.0018 

0.0034 

Pareto 

12.29 

29.01 

0.7855 

19.6962 

0.0392 

0.0793 

NN(4) 

12.04 

15.42 

0.0556 

8.5549 

0.0653 

0.103 

N(0,1) 

14.25 

11.10 

0.0007 

0.1603 

0.0008 

0.0012 

Eogistie 

13.79 

19.27 

0.0022 

0.2689 

0.0014 

0.0024 

Beta(l,l) 

12.96 

47.08 

n = 100, 
0.0972 

iV = 10 

0.0558 

0.0096 

0.0118 

Exp(l) 

9.42 

37.24 

0.0091 

0.2377 

0.0011 

0.0027 

Pareto 

8.51 

9.78 

0.1009 

18.6903 

0.0222 

0.0613 

NN(4) 

10.24 

6.72 

0.0217 

1.4357 

0.0232 

0.0411 

N(0,1) 

13.77 

6.58 

0.0004 

0.0357 

0.0003 

0.0005 

Eogistie 

12.02 

13.42 

0.0012 

0.0992 

0.0007 

0.0012 

Beta(l,l) 

15.88 

48.93 

n = 200, 

0.0741 

iV = 10 

1.2338 

0.0045 

0.0051 

Exp(l) 

9.24 

41.36 

0.0068 

0.4547 

0.0007 

0.0017 

Pareto 

8.63 

9.85 

0.0661 

34.3823 

0.0123 

0.0323 

NN(4) 

10.11 

3.92 

0.0128 

4.0956 

0.0125 

0.0213 

N(0,1) 

14.08 

5.74 

0.0003 

0.0907 

0.0002 

0.0003 

Eogistie 

13.15 

14.07 

0.0007 

0.1936 

0.0004 

0.0006 

Beta(l,l) 

10.18 

49.84 

n = 500, 

0.0192 

iV = 20 

0.0226 

0.0021 

0.0024 

Exp(l) 

4.40 

3.55 

0.0006 

0.2331 

0.0005 

0.0015 

Pareto 

8.67 

1.94 

0.0181 

16.0924 

0.0083 

0.0253 

NN(4) 

9.97 

2.75 

0.0059 

0.5994 

0.0058 

0.0110 

N(0,1) 

14.41 

2.94 

0.0001 

0.0329 

0.0001 

0.0001 

Eogistie 

13.26 

5.15 

0.0003 

0.0905 

0.0001 

0.0003 
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0 5 10 15 20 


Day 


Figure 1: Upper panel left: the loglikelihood £(m); Upper panel right: the like¬ 


lihood ratio R{t) for change-point of the increments of the loglikelihoods £{m)', 


Lower panel: the density estimates for the chicken embryo data. 
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lihood ratio i?(r) for change-points, the histogram of the grouped data and the 
kernel density Jk, the MLE Jml, the MHDE /MHD^ the AMHDE /AMHD^ and 
the proposed maximum Bernstein likelihood estimate (MBEE) /b- Erom this fig¬ 
ure we see that the proposed MBEE /b and the parametric MEE /ml are similar 
and fit the data reasonably. The kernel density is clearly not a good estimate. The 
AMHDE /amhd seems to have overestimated / at numbers close to 0. 


6 Concluding Remarks 

The proposed density estimate fm{t',p) has obviously considerable advantages 
over the kernel density: (i) It is more efficient than the kernel density because 
it is an approximate maximum likelihood estimate; (ii) It is easier to select an opti¬ 
mal model degree m than to select an optimal bandwidth h for the kernel density; 
(hi) The proposed density estimate aims at fm{t]po) which is the best 

approximate of / for each m, while the kernel density fx aims at / * Kh, the 
convolution of / and Kx 

Another significance of this paper is the introduction of the acceptance-rejection 
argument in proving the asymptotic results where an approximate model is as¬ 
sumed which is new to the knowledge of the author. 


A Proofs 

A.l Proof of Theorem [T] 

., Mr) as the class of functions (j){t) on 
..., r, exist and are continuous with the 

properties 


Proof. We define = Ar((5, Mq, M 2 , ■ ■ 
[0,1] whose first r derivatives i = 1, 


6 ^ ^ Mo, {t)\ ^ Mi, 2^i^r, (16) 
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for some 5 > 0, Mi > 0, i = 0, 2,..., r. A polynomial of degree m with “positive 


coefficients” is defined by 


Lorentz 


(19631 as ct>m{t) = 


where q ^ 0, i = 0,..., m. Theorem 1 of Lorentz ( 1963| l proved that for given 
integers r ^ 0, 5 > 0, and positive constants Mi ^ 0, f = 0, 2, 3,..., r, then there 
exists a constant Cr = Cr{5, Mq, M 2 , ..., Mr) such that for each function (j) G 
Ar((5, Mq, M 2 ,..., Mr) one can find a sequence (/)m, m = 1, 2,..of polynomials 
with positive coefficients of degree m such that 

\4){t) - (f>mit)\ ^ CrAl^{t)u}{Amit),(j)^'''>), 0 < f ^ 1, (17) 

where a;(5, 0) = sup|,^_y |<5 \(j){x) - (p{y)\. 

Under the conditions of Theorem [T| we see that Mq = max* f{t), Mi = 
maxj 1/^®^ (f)|, f = 2,..., r, are finite anda;(Am(f), ^ 2Mr. So by the above 
result of ] 


Lorentz 


(1963 1 we have a sequence (pmit) = 


m = 0,1,..of polynomials with positive coefficients of degree m such that 


\f{t)-^mit)\^2CrMrA:^{t), 0 ^ ^ 1 . 


(18) 


It is clear that 


Am{t) 


Since f{t)dt = 1, we have, by (19 1 , 

m 

1 — Cij{m + 1 ) ^ 2CrMrm~^/‘^. 


i=0 


(19) 


( 20 ) 
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Let Pmi = Cij Ya =0 Cj, i = 0, . . . , m, then fm{t) = YT=oPmilirni{t). It follows 


easily from (181 and (201 that ([T]l is true. 


□ 


A.2 Proof of Theorem |2] 

We will prove the assertion (i) only. The assertion (ii) can be proved similarly. 


Proof. The matrix of second derivatives of £r{p) is 


H{p) = 


d‘^£R{p) 

dpdp'^ 


Pm iS^i)Pm (®*) 

^ {'Ylj=oPrnjPmj{Xi)}‘^ 


E 


For any u = (mq, • ■ ■, UmY ^ as n —> cx), 




Ylj=Q UjPmjiX) 


n dpdp'^ ' \{YYjLQPmjPmj{X) ^ 

Clearly, /3mo{t), ■ ■ ■ ,Pmm{t) are linearly independent nonvanishing functions on 
[0,1]. So, with probability one, H{p) is negative definite for all p and sufficiently 
large n. By Theorem 4.2 of ] 


Redner & Walker 


(1984 1 , as s —oo, converges 


to the maximizer of £r{p) which is unique. 


□ 


A.3 Proof of Lemma|3] 

Proof. By ([^ and ( [T^ we know that under the condition of the lemma fm{t) = 
fm{t',Po) converges to f{t) at a rate of at least 0{m~^), i.e., 

f{t) = fm{t) + 0{m~Y, (21) 
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and, furthermore, since f{t) ^ <5, 


Cm = sup = 1 + 0(m ^), 

t I[i) 


( 22 ) 


uniformly in m. 

Let ui,..., be a sample from the uniform(0,l). By the acceptance-rejection 


method in simulation (Ross 2013), for each i, if Ui ^ fm{xi)/Cmf{xi), then Xi 


can be treated as if it were from fm- Assume that the data xi,..., have been 
rearranged so that the first Um observations can be treated as if they were from fm- 
By the law of iterated logarithm we have 


— 


I [uj^ 


i=l 


fmixi) 

cfixi) 


So we have 


= n — 0{nm ^) — 0 (^\/nm ^ log log , as.. 


(■r{p) = fm{Xi]p) = iR{p)+Rmn, 


i=l 


where £r{p) = “almost complete” likelihood and 

n 

Rmn= ^ \ogfm{xi-,p) = ^ log/m(xi;p). 


Ui>- 




Because 0 < 5 ^ f{t) ^ Mq, we have 0 < <5' ^ fm{xi',p) ^ Mg for some 
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constants 5' and Mq. By the law of iterated logarithm 
\Rmn\ = max{|log(5'|,|logMo|}^/ 

V CmjyXi)/ 

= max{| logj'l, I logMo|}(n - z/m) 

= 0{nm~^) + 0 nm~^ log log , a.s.. (23) 

The proportion of the observations that can be treated as if they were from fm is 

— = 1 — 0{m~^) — 0 (^'m~^\og logn/n^ , a.s.. 

So the complete data xi,... ,Xn can be viewed as a slightly contaminated sample 
from fyn- □ 


A.4 Proof of Theorem |4] 

Proof. The Taylor expansions of log fm{xj,p) at log fm{xj,po) yield that, for 


P £ 


^r{p) = '^^Og fmix j,p) 
i=i 

' fmiXj,p) - fmiXj,Po) _ 1 {fmiXj,p) - fmiXj,Po)}'^ 

fmiXj,Po) 2 {fmiXj,Po)V 


hipo) + 

J=1 


T Rmni 


where Rm,n = o(nr„), a.s.. 


Let p be a point on the boundary of ]Bm(?*n)> i-e., ||p — poll r = ^n- By the law 


of iterated logarithm we have 


fmixj,p) fmiXj,Po) / ] ] n 

> , - f ^ -= Oirn^nloglogn), a.s., 


i=i 


fmiXj,Po) 
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and that there exists rj > 0 such that 


^ {fni{Xj,p) - fm{Xj,Po)}^ 2 , 2 ^^ 

2 ^-- = Vnxn + Oir^Vnloglogn). 


i=i 

Therefore we have 


{fm{xj,Po)V 


^r{p) = 4(Po) + 

j = l L 


fm{xj,p) - fm{xj,poJ_ _ lijm{xj,p) - fm{xj,Po)}^ 

2 


fmiXjiPo) 


{fm{Xj,Po)Y 


1 


= ^i?(Po) - + 0(r^A/nloglogn) + 0{rn\/n loglogn) + o{nrl), 


Since m = 0(n^/^), nm ^ = o{nr‘^). So there exists rj' > 0 such that inip) ^ 
(■R{po) — r]'nr1^ = ^R{pQ) — r]'{log nY. S'mcQ d‘^iB{p)/dpdp'^ < 0, the maximum 
value of Ir{p) is attained by some fm{',PR) with pR being in the interior of 
Bm(r’n)- 

□ 


A.5 Proof of Theorem |5] 

Proof. It is easy to see that ( [TT] ) and ( [T^ follow from Theorem the bound¬ 
edness of /, and the triangular inequality. □ 


A.6 Proof of Theorem in 


Proof. By (21 1 we have 


Oi = 0mi{Po) + 0{m ’'Ati), 


(24) 


+ o{nrl) 

a.s.. 
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where 


r^i 

Oi= f{x)dx, 6mi{po) 

J ti—\ 



fm{x;po)dx, i = l,...,N. 


Because 0 < <5 ^ f{t) ^ Mq we have 


c = max 
l€i€N 


dmi (Po) 


= l + 0{m-^) 


(25) 


uniformly in m. Assume that yi,..., is a random sample from the discrete 
distribution with probability mass function 9i = P{Y = i),i = 1,... ,N. 

Let ui,..., Un be a sample from the uniform(0,l). For yi 6 = {6i ,..., 6n), 
i = 1,..., n, let Uj ~ [7(0,1). If Ui ^ c~^6^yJ6yi, then y, can be treated as if it 
were from 0m(Po) = (('mi(Po), • • •, ('mAr(Po))- Assume that the data yi,...,yn 


have been rearranged so that the first i/m observations can be treated as if they were 
from Om{po)- By the law of iterated logarithm we have 


— 




i=l 


dmyi (Po) 


cO. 


Vi 


= n — 0{nm ^) — 0 ^\/nm ^ log log , a.s.. 


So we have 


N 

^g(Po) = '^rii log 0mi{po) = iciPo) + Rmn, 
i=l 


where 


iciPo) = EE I{yj = i) log 9mi (Po) = log 9mi{Po)i 

j=li=l j=l 
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ni = #{j ; Vj = i}, fii = Y!jZi Hvj = i),i = 1 ,- ■ ■ ,N, and 

N N 

Rmn = ^ ^Og 6 mi = '^{ni - fli) logOmi- 

i=\ i=\ 

It is clear that there exist 5 ' > 0 and Mq > 0 such that 5 'At* ^ 9 mi ^ MQAti. By 
the law of iterated logarithm, 

\Rmn\ ^ max{| log( 5 'Ati|, I logAfgAfilKn - r-m) 

= 0 {nm~^) + 0 (\/nm~^ log log n') , a.s.. 


The Taylor expansions of log 6 mi{p) at log 6 mi{Po) yield that, for p G ]Bm(^n)> 
N 


~^ g { p ) = ^nilog6»mi(p) 


i=l 


N 


^g(Po) + 


2=1 


^mi (p) - ^mi (po) 1 {9mi{jp) &mi (po)}^ 


GmiijPo) 


{6'mi(Po)}^ 


T Rmni 


where Rmn = o(nr„), a.s.. 


Let p be a point on the boundary of ]Bm(?"n)> i-e., ||p — Polls = fn- It follows 
from the law of iterated logarithm that there exists rj > 0 such that 


N 


— ^= Vnrl + O(r^\/loglogn/n). 


2 = 1 


{6'mi(Po)}^ 


Therefore 


N 


Ig{p) ='^g(Po) + X] 


Uj 


2=1 


^mii,P) ^mi(Po) 1 {^mi(p) ^mi(Po)} 

' ~ 2 ' 


0mi{Po) 


{Omi{Po)V 


+ oinr^, 


= ^g(Po) - + O(r^\/loglogn/n) + 0 (rn\/nloglogn) + o{nrl), a.s.. 
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Since m = 0(n^/^), nm ^ = o(nr^). So there exists r]' > 0 such that icip) ^ 
^g(Po)—= ^g(Po) —V(logn)^. Since d'^icip)/dpdp'^ < 0, the maximum 
value of icip) is attained by some pc in the interior of Bm(rn)- 

□ 


A.7 Proof of Theorem 0 

Proof. By ( fTS] ) and the Taylor expansion we have 

Up - Poll! = ||p-po||G + 0(maxAfi) [ 

* Jo 

+ max I(f) 10 (max Atf). (26) 

By Pinjit) = {m + l){f3ra-i,j-i{t) “ /3m-i,j{t)}, we have 

^ m^|C'i-v/'0m(f) + C'2V’m(f)|- 


It follows easily from/3((,j(f) = m(m+l){/3m-2j-2(f)-2/3m-2,i-i(f)+/3m-2j(f)} 


that |V'm(f)| ^ C^mf. Thus by (261 we can obtain 


IIp-PoIIr- IIp-PoIIg = 0(maxAfi)(3(m^)||p-po||it + 0(m^)0(maxAf-). 


2 2 

Therefore we have \\p — po\\j^ = \\p — Po|Ig + 0(m'^rnaxi Af^). The proof is 
complete. □ 
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A.8 Proof of Theorem |8] 


Proof. Similar to the proof of Theorem]^ ( [T4| and ( [T5| ) follow easily from Theo¬ 
rems]^ and |7] (|22|), the boundedness of /, and the triangular inequality. □ 
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